With many potential predictors we can eliminate some by examining correlation plots

There is weak correlation between SOC_stock_spline and the selected covariates. Additionally there is some collinearity between predictors. The spectral predictors such as NDYI (Normalized Difference Yellow Index) and NDVI (Normalized Difference Vegetation Index). We can remove these based on the correlation coefficient > 0.7.

Now we can begin examining the distribution of carbon stock values in the dataset to choose the appropriate transformation. We also scale and center the numeric predictor variables.

Model building

Now build models using log transformed carbon stock data. We need to specify that sample_ID is a random effect because of the multiple samples at one location. lower_depth will be a random slope to adjust model based on how it is affected by depth.

Also lower depth is included as a fixed effect to specify an overall effect of depth in addition to accounting for the random slope within sample_ID locations. See:

We also use a different optimizer for the lmer model which uses nloptwrap by default and gives convergence warnings for multiple model iterations. We use the bobyqa optimizer and have assessed negligiable changes to log liklihood (< 0.1) and parameter estimates (< 0.00001). See: - https://cran.r-project.org/web/packages/lme4/vignettes/lmerperf.html

We could use a Generalized Linear Model or a Generalized Linear Mixed Model here too but they often fail to converge. We will model with the log-transformed soil carbon stocks in the linear mixed model

Pairwise comparisons between the top, largest model and the rest show if models with fewer parameters are significantly different. Considering AIC will help determine the most parsimonious model with the best fit to the data.

From the ANOVAs it looks like model 7 has the lower AIC when compared together.

Model 7 is the lowest AIC

Next test if there is an interaction

Model assumptions check

Spatial Autocorrelation

Model fit stats and coefficients (R^2, etc)

The model \(R^2\) for mod7 is 0.832

Anova between sites

From Yu et al,.

Correlations among predictors for wetlands, uplands, mesic

  • Partial correlation measures the strength of relationship between two variables (e.g., SOC concentration and MAT) while removing the effects of one or more possibly confounding variables (e.g., geochemical variables) on this relationship. Differences between zero-order (i.e., no other variables have been controlled) and partial correlations indicate the degree of dependency of target predictors (e.g., MAT) on controlled predictors (e.g., geochemical variables).

  • So the interaction between two predictor variables.

GAMMs because of under prediction

Random Forest